PERF: nunique perf improved by using len(unique) rather than value_counts #9364
Can you also time this for an example with NaNs?
@@ -440,7 +440,9 @@ def nunique(self, dropna=True):
         -------
         nunique : int
         """
-        return len(self.value_counts(dropna=dropna))
+        if dropna:
+            return len(set(self.unique()) - {None})
This is probably failing because we usually represent NA in pandas as np.nan, not None.
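To illustrate the point above without depending on pandas itself, here is a minimal pure-Python sketch. The helpers `is_missing` and `nunique_sketch` are hypothetical names introduced only for this example and are not the pandas implementation; the sketch assumes missing values are either None or a float NaN.

```python
import math

def is_missing(value):
    # Assumption for this sketch: None and float NaN both count as missing.
    return value is None or (isinstance(value, float) and math.isnan(value))

def nunique_sketch(values, dropna=True):
    # Hypothetical stand-in for Series.nunique, not the pandas implementation.
    present = {v for v in values if not is_missing(v)}
    any_missing = any(is_missing(v) for v in values)
    if dropna or not any_missing:
        return len(present)
    # With dropna=False, all missing values count as one extra category.
    return len(present) + 1

nan = float("nan")
values = [1.0, 2.0, nan, 2.0]

# Subtracting {None} leaves NaN in the set, so NaN is still counted.
naive = len(set(values) - {None})

# Filtering on NaN explicitly gives the count that dropna=True should return.
fixed = nunique_sketch(values, dropna=True)
with_na = nunique_sketch(values, dropna=False)
```

Here `naive` still counts the NaN entry, while `fixed` excludes it, which is the difference a test with NaN data would expose.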
It would be nice to also formalize this timing with a new vbench.
Here are the examples with NaNs:
Could you give me an example for the vbench? Should it be in the pandas project or in the vbench project?
I had not rebased onto the latest master, and this had already been done in ff124f9.
@gtnx sorry about that, I forgot this was fixed (and the issue was orphaned).
Before:
After: